Locomotion
Meta-Reinforcement Learning of Structured Exploration Strategies
Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine
Exploration is a fundamental challenge in reinforcement learning (RL). Many current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical applications of RL involve learning more than a single task, and prior tasks can be used to inform how exploration should be performed in new tasks. In this work, we study how prior tasks can inform an agent about how to explore effectively in new situations. We introduce a novel gradient-based fast adaptation algorithm, model-agnostic exploration with structured noise (MAESN), to learn exploration strategies from prior experience. The prior experience is used both to initialize a policy and to acquire a latent exploration space that can inject structured stochasticity into a policy, producing exploration strategies that are informed by prior knowledge and are more effective than random action-space noise. We show that MAESN is more effective at learning exploration strategies when compared to prior meta-RL methods, RL without learned exploration strategies, and task-agnostic exploration methods. We evaluate our method on a variety of simulated tasks: locomotion with a wheeled robot, locomotion with a quadrupedal walker, and object manipulation.
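To make the idea concrete, here is a minimal sketch of MAESN-style structured exploration in PyTorch: a policy conditioned on a per-episode latent z drawn from learned variational parameters, plus one inner-loop gradient step on those parameters. The class name, network sizes, and the placeholder objective are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MAESNPolicy(nn.Module):
    """Policy conditioned on a per-episode exploration latent z."""
    def __init__(self, obs_dim, act_dim, latent_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim))
        # Per-task variational parameters of the latent exploration space.
        self.mu = nn.Parameter(torch.zeros(latent_dim))
        self.log_sigma = nn.Parameter(torch.zeros(latent_dim))

    def sample_latent(self):
        # Reparameterized sample so gradients reach (mu, log_sigma).
        eps = torch.randn_like(self.mu)
        return self.mu + self.log_sigma.exp() * eps

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))

policy = MAESNPolicy(obs_dim=4, act_dim=2)

# Sample z once and hold it fixed for the whole episode: the injected
# stochasticity is temporally coherent, unlike per-step action noise.
z = policy.sample_latent()
obs = torch.randn(4)
for _ in range(5):
    action = policy(obs, z)      # same z at every step
    obs = torch.randn(4)         # stand-in for env.step(action)

# Inner-loop adaptation (placeholder objective): gradients flow into
# (mu, log_sigma) through the reparameterized z, shifting where the
# policy explores on the current task.
surrogate = policy(obs, z).sum()
g_mu, g_ls = torch.autograd.grad(surrogate, [policy.mu, policy.log_sigma])
with torch.no_grad():
    policy.mu += 0.1 * g_mu
    policy.log_sigma += 0.1 * g_ls
```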
Meet the 3D-printed robot that walks without electronics
Researchers at the Bioinspired Robotics and Design Lab at UC San Diego created a fully 3D-printed, six-legged robot that walks using compressed air. It has no electronics, motors, or batteries: just soft actuators powered by gas. Tested on various terrains, it operates continuously with a steady air supply.
AcL: Action Learner for Fault-Tolerant Quadruped Locomotion Control
Xu, Tianyu, Cheng, Yaoyu, Shen, Pinxi, Zhao, Lin
Quadrupedal robots can learn versatile locomotion skills but remain vulnerable when one or more joints lose power. In contrast, dogs and cats can adopt limping gaits when injured, demonstrating their remarkable ability to adapt to physical conditions. Inspired by such adaptability, this paper presents Action Learner (AcL), a novel teacher-student reinforcement learning framework that enables quadrupeds to autonomously adapt their gait for stable walking under multiple joint faults. Unlike conventional teacher-student approaches that enforce strict imitation, AcL leverages teacher policies to generate style rewards, guiding the student policy without requiring precise replication. We train multiple teacher policies, each corresponding to a different fault condition, and subsequently distill them into a single student policy with an encoder-decoder architecture. While prior works primarily address single-joint faults, AcL enables quadrupeds to walk with up to four faulty joints across one or two legs, autonomously switching between different limping gaits when faults occur. Quadruped robots are gaining popularity as versatile mobile platforms capable of navigating diverse terrains and performing robust locomotion tasks such as search and rescue operations in buildings, cargo delivery in cities, and planetary exploration. In such scenarios, quadrupeds may encounter faults that cannot be immediately repaired, requiring them to continue their tasks despite the malfunction.
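The style-reward idea can be sketched as a soft similarity bonus rather than a strict imitation loss. The toy example below is a hedged illustration, assuming the teacher for the active fault condition proposes an action; the function names, the Gaussian similarity kernel, and the weights are assumptions, not the AcL paper's exact formulation.

```python
import numpy as np

def style_reward(student_action, teacher_action, sigma=0.5):
    """Soft similarity bonus: 1 when actions match, decaying smoothly."""
    d2 = np.sum((student_action - teacher_action) ** 2)
    return np.exp(-d2 / (2 * sigma ** 2))

def total_reward(task_reward, student_action, teacher_action, w_style=0.3):
    # The style term guides but never dominates the task objective, so
    # the student may deviate from the teacher when a fault demands it.
    return task_reward + w_style * style_reward(student_action, teacher_action)

# Example: teacher trained for a hypothetical "front-left knee locked" fault.
a_student = np.array([0.2, -0.1, 0.4])
a_teacher = np.array([0.25, -0.05, 0.35])
print(total_reward(task_reward=1.0, student_action=a_student,
                   teacher_action=a_teacher))
```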
Fancy humanoid robot no longer walks like it urgently needs a toilet
Human-looking bipedal robots can already run, jump, breakdance, punch, and generally perform broad feats of athletic prowess most humans could only dream of. One thing they are still pretty bad at, though, is walking a straight line without looking like they are moments away from soiling themselves. Figure AI, one of the buzziest startups in the humanoid robot space, now says it has engineered a solution to address its machines' stiff shuffle-step. The more natural-looking stride was achieved by analyzing thousands of virtual humanoid robots walking simultaneously in a simulated digital environment, Figure explained in a recent blog post. The company used reinforcement learning, rewarding the virtual robots for actions like synchronized arm swings, heel strikes, and toe-offs (when the toe leaves the ground) that more closely resemble human movement.
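Figure has not published its reward function, but reward shaping of this kind can be sketched as a sum of bonuses for human-like gait events. Everything below, including the event inputs, weights, and the phase-offset term, is an assumption for illustration only.

```python
import math

def gait_style_reward(heel_strike, toe_off, arm_phase, leg_phase):
    """Bonus for human-like gait events during one simulated step."""
    r = 0.0
    if heel_strike:
        r += 0.5                  # reward landing heel-first
    if toe_off:
        r += 0.5                  # reward pushing off from the toe
    # Arms should swing roughly out of phase with the same-side leg
    # (offset of about pi), mimicking a natural human arm swing.
    r += 0.2 * math.cos(arm_phase - leg_phase - math.pi)
    return r

print(gait_style_reward(True, False, arm_phase=0.0, leg_phase=math.pi))
```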
TAR: Teacher-Aligned Representations via Contrastive Learning for Quadrupedal Locomotion
Mousa, Amr, Karavis, Neil, Caprio, Michele, Pan, Wei, Allmendinger, Richard
Quadrupedal locomotion via Reinforcement Learning (RL) is commonly addressed using the teacher-student paradigm, where a privileged teacher guides a proprioceptive student policy. However, key challenges, such as representation misalignment between the privileged teacher and the proprioceptive-only student, covariate shift due to behavioral cloning, and a lack of deployable adaptation, lead to poor generalization in real-world scenarios. We propose Teacher-Aligned Representations via Contrastive Learning (TAR), a framework that leverages privileged information with self-supervised contrastive learning to bridge this gap. By aligning representations to a privileged teacher in simulation via contrastive objectives, our student policy learns structured latent spaces and exhibits robust generalization to Out-of-Distribution (OOD) scenarios, surpassing the fully privileged "Teacher". Results show training accelerated by 2x compared to state-of-the-art baselines in reaching peak performance, and OOD scenarios show better generalization by 40% on average compared to existing methods. Open-source code and videos are available at https://ammousa.github.io/TARLoco/.
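A common way to align student and teacher representations contrastively is an InfoNCE-style objective over paired embeddings; the sketch below assumes that standard setup and is not the TAR authors' implementation.

```python
import torch
import torch.nn.functional as F

def alignment_loss(z_student, z_teacher, temperature=0.1):
    """InfoNCE: matched (student_i, teacher_i) pairs are positives;
    every other teacher embedding in the batch is a negative."""
    z_s = F.normalize(z_student, dim=-1)
    z_t = F.normalize(z_teacher, dim=-1)
    logits = z_s @ z_t.t() / temperature   # (B, B) cosine similarities
    labels = torch.arange(z_s.size(0))     # positives on the diagonal
    return F.cross_entropy(logits, labels)

z_s = torch.randn(32, 64)  # student latents from proprioception only
z_t = torch.randn(32, 64)  # teacher latents with privileged observations
print(alignment_loss(z_s, z_t).item())
```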
Behavioral Conflict Avoidance Between Humans and Quadruped Robots in Shared Environments
Wei, Shuang, Zhang, Muhua, Gan, Yun, Huang, Deqing, Ma, Lei, Yang, Chenguang
Nowadays, robots are increasingly operated in environments shared with humans, where conflicts between human and robot behaviors may compromise safety. This paper presents a proactive behavioral conflict avoidance framework for quadruped robots, based on the principle of adaptation to trends, that not only ensures the robot's safety but also minimizes interference with human activities. It can proactively avoid potential conflicts with approaching humans or other dynamic objects, whether the robot is stationary or in motion, and then swiftly resume its tasks once the conflict subsides. An enhanced approach is proposed to achieve precise human detection and tracking on a vibratory robot platform equipped with a low-cost hybrid solid-state LiDAR. When a potential conflict is detected, the robot selects an avoidance point and executes an evasion maneuver before resuming its task. This approach contrasts with conventional methods that remain goal-driven, which often results in aggressive behaviors, such as forcibly bypassing obstacles and causing conflicts, or becoming stuck in deadlock scenarios. Avoidance points are selected by integrating static and dynamic obstacles to generate a potential field map; the robot then searches for feasible regions within this map and determines the optimal avoidance point using an evaluation function. Experimental results demonstrate that the framework significantly reduces interference with human activities and enhances the safety of both robots and people.
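A hedged sketch of that selection step: build a potential field from static and dynamic obstacle costs, then score candidate avoidance points with an evaluation function trading off safety, detour length, and ease of resuming the goal. The weights, the short-horizon motion prediction, and the scoring terms are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def potential(p, static_obs, dynamic_obs, w_static=1.0, w_dyn=2.0):
    """Higher potential = less safe. Dynamic obstacles (e.g. a walking
    person) are weighted more heavily than static ones."""
    cost = 0.0
    for o in static_obs:
        cost += w_static / (np.linalg.norm(p - o) + 1e-3)
    for o, v in dynamic_obs:                 # (position, velocity) pairs
        predicted = o + 0.5 * v              # short-horizon prediction
        cost += w_dyn / (np.linalg.norm(p - predicted) + 1e-3)
    return cost

def pick_avoidance_point(candidates, robot, goal, static_obs, dynamic_obs):
    # Evaluation: low potential, small detour from the current position,
    # and easy resumption toward the goal once the conflict subsides.
    def score(p):
        return (potential(p, static_obs, dynamic_obs)
                + 0.5 * np.linalg.norm(p - robot)
                + 0.2 * np.linalg.norm(p - goal))
    return min(candidates, key=score)

robot, goal = np.array([0.0, 0.0]), np.array([5.0, 0.0])
static_obs = [np.array([2.0, 0.5])]
dynamic_obs = [(np.array([3.0, -1.0]), np.array([-1.0, 0.5]))]
cands = [np.array([1.0, 1.5]), np.array([1.0, -1.5]), np.array([0.0, 2.0])]
print(pick_avoidance_point(cands, robot, goal, static_obs, dynamic_obs))
```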
Autonomous Exploration-Based Precise Mapping for Mobile Robots through Stepwise and Consistent Motions
Zhang, Muhua, Ma, Lei, Wu, Ying, Shen, Kai, Sun, Yongkui, Leung, Henry
This paper presents an autonomous exploration framework designed for indoor ground mobile robots that use laser Simultaneous Localization and Mapping (SLAM), ensuring process completeness and precise mapping results. For frontier search, a local-global sampling architecture based on multiple Rapidly-exploring Random Trees (RRTs) is employed. Traversability checks during RRT expansion and global RRT pruning upon map updates eliminate unreachable frontiers, reducing potential collisions and deadlocks. Adaptive sampling density adjustments, informed by obstacle distribution, enhance exploration coverage. For frontier point navigation, a stepwise consistent motion strategy is adopted, wherein the robot drives strictly straight along approximately equidistant line segments of the polyline path and rotates in place at segment junctions. This simplified, decoupled motion pattern improves scan-matching stability and mitigates map drift. For process control, the framework serializes frontier point selection and navigation, avoiding the oscillation caused by frequent goal changes in conventional parallelized processes. A waypoint retracing mechanism is introduced to generate repeated observations, triggering loop closure detection and backend optimization in graph-based SLAM, thereby improving map consistency and precision. Experiments in both simulation and real-world scenarios validate the effectiveness of the framework. It achieves improved mapping coverage and precision in more challenging environments compared to baseline 2D exploration algorithms, and it shows robustness in supporting resource-constrained robot platforms and maintaining mapping consistency across various LiDAR field-of-view (FoV) configurations.
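The stepwise, decoupled motion pattern can be sketched as follows, assuming a polyline path of waypoints; the command names are hypothetical. The key property is that rotation and translation never overlap, which is what stabilizes scan matching.

```python
import math

def follow_polyline(waypoints, pose):
    """Rotate in place at each junction, then drive straight: translation
    and rotation are never commanded at the same time."""
    x, y, theta = pose
    commands = []
    for wx, wy in waypoints:
        heading = math.atan2(wy - y, wx - x)
        # Wrap the turn angle into (-pi, pi] before rotating in place.
        turn = (heading - theta + math.pi) % (2 * math.pi) - math.pi
        commands.append(("rotate_in_place", turn))
        commands.append(("drive_straight", math.hypot(wx - x, wy - y)))
        x, y, theta = wx, wy, heading
    return commands

for cmd in follow_polyline([(1, 0), (1, 1), (3, 1)], (0.0, 0.0, 0.0)):
    print(cmd)
```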
Transferable Latent-to-Latent Locomotion Policy for Efficient and Versatile Motion Control of Diverse Legged Robots
Zheng, Ziang, Zhan, Guojian, Shuai, Bin, Qin, Shengtao, Li, Jiangtao, Zhang, Tao, Li, Shengbo Eben
Reinforcement learning (RL) has demonstrated remarkable capability in acquiring robot skills, but learning each new skill still requires substantial data collection for training. The pretrain-and-finetune paradigm offers a promising approach for efficiently adapting to new robot entities and tasks. Inspired by the idea that acquired knowledge can accelerate learning new tasks with the same robot and help a new robot master an already-trained task, we propose a latent training framework in which a transferable latent-to-latent locomotion policy is pretrained alongside diverse task-specific observation encoders and action decoders. This policy operates in latent space, processing encoded latent observations to generate latent actions to be decoded, with the potential to learn general abstract motion skills. To retain essential information for decision-making and control, we introduce a diffusion recovery module that minimizes information reconstruction loss during the pretraining stage. During the fine-tuning stage, the pretrained latent-to-latent locomotion policy remains fixed, while only the lightweight task-specific encoder and decoder are optimized for efficient adaptation. Our method allows a robot to leverage its own prior experience across different tasks, as well as the experience of other morphologically diverse robots, to accelerate adaptation. We validate our approach through extensive simulations and real-world experiments, demonstrating that the pretrained latent-to-latent locomotion policy generalizes effectively to new robot entities and tasks with improved efficiency.
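A minimal sketch of the latent-to-latent layout, assuming simple linear encoders/decoders around a shared MLP core; module names and sizes are illustrative, not the authors' architecture. During fine-tuning the shared core is frozen and only the lightweight task-specific ends are optimized.

```python
import torch
import torch.nn as nn

class LatentToLatent(nn.Module):
    def __init__(self, obs_dim, act_dim, latent_dim=32):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)   # task-specific
        self.core = nn.Sequential(                      # shared, pretrained
            nn.Linear(latent_dim, 128), nn.Tanh(),
            nn.Linear(128, latent_dim))
        self.decoder = nn.Linear(latent_dim, act_dim)   # task-specific

    def forward(self, obs):
        # obs -> latent obs -> latent action -> robot-specific action
        return self.decoder(self.core(self.encoder(obs)))

policy = LatentToLatent(obs_dim=48, act_dim=12)

# Fine-tune stage: freeze the shared latent-to-latent core; only the
# encoder/decoder adapt to the new robot morphology or task.
for p in policy.core.parameters():
    p.requires_grad_(False)
trainable = [p for p in policy.parameters() if p.requires_grad]
optim = torch.optim.Adam(trainable, lr=3e-4)
```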